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Abstract 

Traditional  optical  flow  algorithms  assume  local  image  translational 
motion  and  apply  simple  image  Altering.  Recent  studies  have  taken  two 
separate  approaches  toward  improving  the  accuracy  of  computed  flow:  the 
application  of  spatio-temporal  filtering  schemes  and  the  use  of  generalized 
motion  models  such  as  the  affine  model.  Each  has  achieved  some 
improvement  over  traditional  algorithms  in  specialized  situations.  In  this 
paper,  we  analyze  the  interdependency  between  them  and  propose  a unified 
approach.  The  general  motion  model  we  adopt  characterizes  arbitrary  3-D 
steady  motion.  Under  perspective  projection,  we  derive  an  image  motion 
equation  that  describes  the  spatio-temporal  relation  of  grayscale  intensity 
in  an  image  sequence,  thus  making  the  utilization  of  3-D  filtering  possible. 
However,  to  accommodate  this  complex  motion,  we  need  to  extend  the  filter 
design  to  derive  additional  motion  constraint  equations.  Using  Hermite 
polynomials,  we  design  differentiation  filters,  whose  orthogonality  and 
Gaussian  derivative  properties  insure  numerical  stability.  The  resulting 
algorithm  produces  accurate  optical  flow  and  other  useful  motion 
parameters.  It  is  evaluated  quantitatively  using  the  scheme  established  by 
Barron,  et  al.[4]  and  qualitatively  with  real  images. 
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1.  Introduction 


This  paper  describes  an  algorithm  and  supporting  experimental  results  for  accurate  optical  flow 
and  motion  estimation. 

Research  in  the  field  of  optical  flow,  starting  from  Gibson  [12] , has  spawned  many  algorithms  in 
the  past  two  decades,  and  at  the  same  time  has  led  to  numerous  applications.  To  name  a few,  opti- 
cal flow  can  be  used  to  compute  three-dimensional  motion  and  structure[2]  [43]  ; to  locate  the 
focus  of  expansion  [15]  or  a moving  observer’s  direction  of  heading;  to  segment  independently 
moving  objects[2]  [28]  ; to  detect  motion[9] ; to  stabilize  images [6]  ; to  perform  obstacle  detec- 
tion and  avoidance [3 3]  [45]  [46]  [47]  ; and  to  analyze  medical  video  (2D  echocardiographic 
images)  to  assist  in  diagnosis[8]  . All  of  these  applications  use  optical  flow  data  in  a quantitative 
way.  Although  it  is  true  that  the  optical  flow  field  is  not  necessarily  equal  to  the  motion  field[38] , 
relative  accuracy  in  optical  flow  is  very  important  in  obtaining  qualitative  properties  of  motion. 
For  example,  discontinuities  in  optical  flow  are  useful  qualitative  properties  that  can  be  used  to 
locate  motion  boundaries  more  precisely  if  the  flow  is  more  accurate. 

Evidently,  the  importance  of  accurate  optical  flow  can  not  be  overemphasized.  In  view  of  this, 
Barron,  Fleet,  and  Beaucheniin[4]  developed  a scheme  for  evaluating  optical  flow  algorithms, 
highlighting  the  current  endeavor  to  achieve  greater  accuracy. 

However,  attempts  to  obtain  accurate  motion  estimates  are  impeded  by  three  sources  of  error:  sen- 
sor noise,  brightness  change  over  time,  and  quantization  error.  Sensor  noise  is  caused  by  optical 
or  electronic  irregularities.  Brightness  change  can  occur  in  many  situations,  including  changing 
light  sources,  shadowing,  camera  aperture  adjustment,  or  shading  of  a Lambertian  or  specular  sur- 
face. Quantization  error  is  inherent  in  digital  images.  These  factors  represent  physical  challenges 
that  cannot  be  overcome  by  image  processing  alone  but  can  be  mitigated  by  filtering  techniques. 
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In  addition  to  dealing  with  these  physical  errors,  are  there  other  ways  of  improving  the  current 
best  optical  flow  algorithm?  To  answer  this,  we  need  to  review  other  systematic  difficulties  that 
have  been  facing  us. 

The  first  difficulty  is  the  aperture  problem  or  the  ill-posed  nature  of  the  flow  computation  prob- 
lem. Traditional  optical  flow  algorithms  have  worked  on  finding  reasonable  constraints  to  solve 
this  problem  [3,20,27,32,34,37].  Although  many  ideas  were  proposed,  the  desired  accuracy  was 
not  achieved  due  to  two  factors:  lack  of  attention  to  better  filtering  schemes  and  the  use  of  simple 
assumption  of  uniform  translational  motion.  A good  filtering  scheme  is  essential  in  deahng  with 
the  aforementioned  sources  of  error,  and  uniform  translation  is  insufficient  for  describing  general 
3-D  motion. 

Recent  studies  have  taken  two  separate  approaches  to  improving  accuracy.  The  first  is  the  applica- 
tion of  spatio-temporal  filtering  schemes  [11,19,25,35,41].  The  second  is  the  use  of  generalized 
motion  models  [5,7,8,17,31,35,39,42]  such  as  the  affine  model.  The  fact  that  these  two 
approaches  are  actually  complementary  to  each  other  will  become  clear  as  we  analyze  their  indi- 
vidual advantages  and  disadvantages. 

An  intuitive  idea  for  achieving  better  accuracy  when  applying  a filter  is  to  increase  its  support.  A 
large  support  alleviates  the  aperture  problem,  smooths  out  more  noise,  avoids  aliasing,  and 
reduces  quantization  error  and  truncation  error  of  the  filter.  For  example,  to  estimate  temporal 
derivatives,  recent  research  has  used  multiple  frames  instead  of  simple  successive  temporal  differ- 
ences. In  fact,  more  sophisticated  spatio-temporal  filters,  i.e.,  3-D  filters  (Fig  1.2),  have  been 
developed  to  estimate  image  properties. 

However,  careless  increase  in  filter  support  is  not  adequate  due  to  the  second  difficulty  commonly 
experienced:  lack  of  a good  motion  model.  Since  traditional  algorithms  use  simple  filtering 
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Temporal  difference  or  correlation  3-D  convolution  (filtering) 


Fig  1. 1 Traditional  filtering  approach  Fig  1.2  Spatio-temporal  filtering  approach 

schemes  and  small  filter  supports,  they  can  safely  assume  uniform  translational  motion  as 

described  in  the  image  motion  equation 


I{x,y,t)  = F(x-ut,y-vt)  . (1) 

Once  the  spatio-temporal  filters  are  applied  and  the  support  increases,  the  motion  within  becomes 

more  complicated.  Moreover,  if  we  consider  perspective  projection  of  the  3-D  motion  onto  the  2- 

D image  plane,  the  problem  gets  worse.  For  example,  a forward  moving  observer  sees  a diverging 

scene  in  which  a patch  can  undergo  both  translation  and  expansion  (Fig  2).  Generally,  diver- 


gence, curl,  and  deformation  as  well  as  translation  exist  in  2-D  image  motion.  Unless  the  motion 
model  accommodates  all  these  motion  parameters,  there  is  a limit  to  the  useful  filter  support. 
Recent  research  has  proposed  the  affine  motion  model  to  cope  with  this  difficulty.  However,  once 
the  general  motion  model  is  derived,  one  realizes  that  it  may  not  necessarily  improve  accuracy 

* A patch  centered  at  the  focus  of  expansion  has  expansion  only. 
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because  of  the  new  demand  of  obtaining  additional  parameters.  Even  more  sophisticated  filtering 
techniques  are  required  to  compute  additional  image  properties,  for  example,  one  may  use  higher 
order  derivatives  to  compute  divergence,  curl,  and  deformation  [24] . 

The  above  two  approaches  (spatio-temporal  filtering  and  generalized  motion  model)  have 
achieved  a certain  degree  of  improvement  over  traditional  algorithms.  Interested  readers  may 
refer  to  Section  5 for  more  details  about  these  approaches  and  for  comparisons.  Nonetheless,  the 
interdependencies  between  them  still  set  a limit  to  their  accuracy.  To  answer  the  question  posed 
earlier:  Yes,  we  can  improve  on  the  current  optical  flow  algorithms  by  unifying  a general  motion 
model  and  a spatio-temporal  filtering  scheme. 

A general  image  motion  model  based  on  3-D  relative  motion  has  been  studied  [26] . However,  we 
need  to  extend  the  instantaneous  motion  model  so  as  to  describe  continuous  motion  because  of 
the  intrinsic  requirement  of  spatio-temporal  filtering.  The  continuous  motion  model  is  actually  a 
4-D  model  that  involves  Z,  7,  Z,  r . An  image  motion  equation  based  on  this  model  is  not  tracta- 
ble, especially  considering  the  non-linearity  imposed  by  perspective  projection,  unless  we  make  a 
small  motion  approximation.  It  is  then  clear  that  we  need  a potent  spatio-temporal  filter  design  to 
solve  the  problem. 

The  spatio-temporal  filtering  scheme  we  use  is  based  on  3-D  Hermite  polynomial  differentiation 
filters  [18]  [25] , which  possess  several  advantages:  the  orthogonality  and  Gaussian  derivative 
properties  of  the  filters  insure  numerical  stability;  the  approach  is  generalizable  to  the  higher  order 
derivatives  we  desire;  these  two  properties  make  possible  the  coherent  application  of  multiple  fil- 
ters. Numerous  physiological  models[  14,48]  also  support  the  theory  that  the  visual  receptive  field 
can  be  modeled  by  Gaussian  derivatives  of  various  widths. 

The  pursuit  of  higher  accuracy  is  not  complete  until  we  overcome  the  third  difficulty:  occlusion  or 
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motion  boundary  effects.  That  is  where  accretion  or  deletion  occurs[29]  , and  the  information 
available  to  solve  the  problem  is  reduced.  This  difficulty  is  also  common  to  other  vision  problems 
such  as  stereo  matching.  However,  this  issue  is  beyond  the  scope  of  this  paper  and  will  be  investi- 
gated in  a future  study. 

The  ultimate  goal  of  this  research  is  to  develop  a flexible  set  of  algorithms  that  deals  with  arbi- 
trary 3-D  relative  motion  and  computes  accurate  optical  flow  for  such  applications  as  obstacle 
detection  or  motion  segmentation.  Our  method  is  not  only  capable  of  unifying  the  two  approaches 
attempted  by  recent  research  but  also  results  in  algorithms  whose  output  is  adequate  for  many 
motion  applications.  Its  competitive  performance  is  demonstrated  using  the  evaluation  framework 
established  by  Barron,  et  al.[4] . 

The  remainder  of  the  paper  is  organized  as  follows:  In  Section  2,  we  present  a general  motion 
model  for  arbitrary  3-D  motion  and  derive  an  image  motion  equation.  Section  3 introduces  a spa- 
tio-temporal filtering  scheme  using  3-D  Hermite  polynomial  differentiation  filters,  and  applies 
them  to  the  motion  estimation  problem  based  on  the  image  motion  equation  derived  previously. 
Section  4 provides  implementation  details  with  attention  to  numerical  stability  and  algorithm  flex- 
ibility. Comparisons  to  previous  work  and  specific  contributions  of  this  paper  are  summarized  in 
Section  5.  Section  6 details  the  results  of  quantitative  evaluation  of  our  algorithm  in  comparison 
with  existing  algorithms  cited  in  Barron,  et  al.  [4] . It  also  includes  noise  sensitivity  analysis, 
which  was  not  addressed  by  Barron,  et  al.  In  Section  7 we  present  a qualitative  evaluation  of  the 
results  of  our  algorithm  using  real  images  in  a real  motion  application.  Section  8 concludes  the 
paper. 
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2.  The  General  Motion  Model 


In  this  section,  we  describe  how  the  local  optical  flow  pattern  reflects  arbitrary  3-D  motion  and 
use  this  knowledge  to  derive  a general  motion  model  and  an  image  motion  equation.  Rather  than 
considering  instantaneous  velocity [26]  , we  consider  velocity  over  time  for  the  sake  of  spatio- 

v 7'  ^ 

temporal  filtering.  Let  a 3-D  point  P = {X,Y,Z)  undergo  steady  small  rotation 
(Qp  Qp  translation  (T^,  Ty,  T^)  per  unit  time.  Previous  research  that  deals  with  gen- 

eralizing the  motion  model  has  used  a two-frame  strategy  as  in  [10] , which  may  be  formulated  as 


X 

X 

1 

Qz 

Qy 

Y 

= R 

Y 

+ T,  where  R = 

Q.Z 

1 

and  T = 

Z 

Z 

-Qy 

1 

A. 

Equivalently,  we  write 


X 

X 

1 

Q.Z 

Qy 

T 

^x 

Y 

= M 

Y 

, where  M = 

Q.Z 

1 

-Q.X 

T 
^ Y 

Z 

Z 

-Q.y 

1 

T 

_1_ 

_1_ 

_ 0 

0 

0 

1_ 

is  the  3-D  motion  transformation  matrix. 


(3) 


The  locus  of  a 3-D  point  P (t)  = {X  (t) , Y (t) , Z{t))'^  can  then  be  described  by 


Pit) 

= MM...M 

Pm 

= m" 

1 

o 

1 

1 

1 

is  a polynomial  of  matrices.  If  M were  diagonalizable,  would  be  easily  computed  as 

SA^S  ^ [21] , where  A is  the  diagonal  matrix  composed  of  the  eigenvalues  of  M and  S is  the 
matrix  of  column  eigenvectors.  However,  M has  two  identical  eigenvalues  and  is  not  diagonaliz- 


* In  an  observer-centered  coordinate  frame;  Z is  the  axis  along  the  line  of  sight. 
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able.  Fortunately,  M has  a Jordan  Canonical  Form  SJS  ^ [36]  from  which  can  be  computed 
as  sfs  ^ , where  / has  the  analytical  form  [21] 


J = 


110  0 
0 10  0 
0 0 1-Q.i  0 

0 0 0 1 + a/ 


where  Q 


■ j 


+ Qy  + and  i = 


(5) 


Hence,  f = 


o 

o 

1 

1 t 

0 

0 

0 10  0 

0 1 

0 

0 

0 0 ( 1 - no ' 0 

0 0 

1 - tCli 

0 

0 0 0 ( 1 + no ' 

_0  0 

0 

1 + tO.i 

when  ^2  « 1 . 


(6) 


The  assumption  of  small  rotation,  « 1 , is  also  used  in  [10]  and  most  other  later  studies.  Then, 


m'  = sfs 


1 

tQz 

tQy 

tax  + bx 

1 

tQz 

tQy 

tax 

1 _ 

tQz 

1 

-tQx 

tay  + by 

tQz 

1 

-tQx 

tay 

-tQ.y 

tQ.x 

1 

taz  + bz 

-tQ.y 

tQx 

1 

taz 

0 

0 

0 

1 

0 

0 

0 

1 

(7) 


where  each  of  ay,  a^  is  a function  of  all  of  (Q^,  Qy,  equality 

comes  from  the  intuition  that  = /,  therefore  by,  b^^  must  be  the  residual  error  from  the 
approximation  in  (6)  and  should  be  eliminated  here.  We  may  regard  a^,  ay,  a^  as  translations  in 
the  presence  of  rotation  per  unit  time. 


T 

The  locus  P (t)  = {X(t),Y{t),Z{t))  projects  to  a point  in  the  2-D  image  plane, 
(x(^t),y{t))  .where 


;c(f)  =fX(t)/Zit) 
y{t)  =fY(t)/Z(t) 


, where  / is  the  focal  length.  Let 


= fX/Z 
vJo  = /y/Z 


. Then 


(8) 
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(9) 


f(ta^  + X-  ^0  ~ ■'■  ^^y/) 

X (t)  = = 

“ tO-yX  + tQ.^Y  + Z ta^/Z - t^lyX^  + 

f(taY+  tO.^^  +Y-  tQ.^Z)  fitUyf/ Z + ^^2^0 

y (t)  = = 

ta^  - rQyZ  + tQ.^Y  + Z ta^/Z  - t^lyX^  + tQ.-^yQ  +f 

Note  that  an  instantaneous  velocity  derived  in  [26]  is  a special  case  of  our  formulation,  namely, 
the  velocity  (m,  v)  atr  = 0: 


Note  that  the  flow  is  generally  quadratic.  Computing  optical  flow  based  on  the  uniform  translation 
model  is  far  from  adequate,  and  the  affine  motion  model  is  only  valid  when  there  is  no  rotation  in 
the  X and  Y directions. 

To  derive  an  image  motion  equation  in  the  form  of  (1),  we  start  with  the  brightness  constancy 
equation: 


I{xit),y{t),t)  =/(^Q,yQ,0).  (11) 

Without  loss  of  generality,  let  F (jCq,  y^)  = I {Xq,  Jq,  0)  . It  suffices  to  find  jcq,  Jq  in  terms  of 


» which  will  be  denoted  by  x,y  for  simplicity.  The  resulting  solution  is  extremely 
complicated,  but  assuming  small  rotation  and  small  3-D  translation  relative  to  distance,  namely, 
a^,  ay,  a^«Z,  we  have 


a. 


x + —X  + Q.^y  - 


— + Q. 
Z ^ 


a, 


1 + j^iClyX  - Q^y) 


yo^ 


y + f[  -n2X  + —y-fi^  — -n 
1 


(12) 


Equation  (12)  can  be  further  simplified  by  using  the  approximation  jQ.y  x,  y « 1 , as  fol- 
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lows: 


^0  = + ^zy  “Xy  ^ ~ ) 

>0  = (y  + Y>'  Y “ ^x)))(  ^ ) 

XQ<=x  + t{^-^x  + il^y  -/(-#+  j “ 'i  ) 


(13) 


(14) 

yo  = y + {-n^x  + ^y  -/(y  - ^x))  - ^X>’^) 

The  above  approximation  is  justified  by  the  following  facts.  First,  the  value  of  / in  pixel  units  is 
usually  large.  For  example,  for  a 256x256  image  with  a field  of  view  of  45  degrees,  / is  309  pix- 
els. Second,  since  we  are  concerned  with  motion  in  a relatively  small  image  local  neighborhood, 
so-called  pointwise  analysis,  x,  y,  t are  small.  Third,  a small  rotation  in  the  X and  Y direction  in 
3-D  space  can  be  approximated  in  the  2-D  image  plane  as  a simple  translation.  Since  we  are  inter- 
ested in  finding  optical  flow  rather  than  3-D  motion  and  structure,  we  do  not  lose  much  accuracy 
here.  The  error  from  this  approximation  will  be  absorbed  by  the  translation  parameters 

thus  offsetting  the  optical  flow  error.  Inherent  3-D  motion  ambiguities  related  to  this  were 
described  in  [1]  [44] . We  will  also  use  the  above  arguments  for  further  simplification  in  our  algo- 
rithm development  (Section  4.2). 

Now  the  image  motion  equation  is,  from  (11)  and  (14), 


I{x,y,t)  = + yx  + py -H  + exyj,  y + - px  + yy  + 6xy  + ey^JJ,  (15) 

wherea  = + = Y’ ^ ^ ~ 

We  need  to  develop  a filtering  scheme  to  relate  all  the  above  motion  parameters  to  the  3-D  filter 
output  and  then  solve  them  in  order  to  estimate  the  optical  flow,  which  is  (-a,  -p)  at 
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;c  = j = r = 0 from  (10)  and  (16).  Note  that  these  motion  parameters  are  closely  related  to  3-D 
motion. 

To  demonstrate  the  behavior  of  the  image  motion  equation,  consider  the  following  basic  motion 
patterns: 

1.  When  there  is  no  rotation,  and  no  translation  in  the  Z direction,  then 
y=p  = 6 = e = 0 (16),  and  there  is  uniform  image  translational  motion,  as 
assumed  by  traditional  algorithms.  (Fig  2.1) 

2.  When  there  is  no  rotation,  p = 5 = 8 = 0 , hence  the  image  motion  is  affine  without 
rotation,  i.e.  with  only  expansion  and  translation.  (Fig  2.2) 

3.  When  there  is  no  translation  in  the  Z direction,  7=0  and  no  rotation  in  the  X and  Y 
direction,  6 = 8 = 0 , the  image  undergoes  translation  and  rotation.  (Fig  3.1) 


Note  that  the  images  shown  are  merely  enlarged  local  neighborhoods  to  reveal  their  particular 
behaviors  under  motion. 


4.  When  there  is  no  rotation  in  the  X and  Y direction,  5 = 8 = 0,  the  image  undergoes 
affine  motion  without  deformation,  i.e.  only  with  translation,  expansion  and  rotation. 
(Fig  3.2) 


Fig  3.1  Translation  and  rotation 


5.  An  arbitrary  3-D  motion  generates 


an 


Fig  3.2  Affine  motion  without  deformation 
image  motion  like  Fig.  4. 
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In  summary,  the  image  motion  equation  is  based  on  expedient  and  reasonable  approximations.  It 
is  applicable  not  only  to  the  algorithm  developed  here,  but  also  to  other  motion  algorithms, 
although  the  extent  of  improvement  depends  on  the  particular  algorithm.  For  the  gradient-based 
method,  the  filtering  process  is  the  decisive  factor  as  far  as  performance  is  concerned.  We  formu- 
late the  theory  of  Hermite  polynomial  differentiation  filters  in  the  next  section. 


3.  Hermite  Polynomial  Filters 

3.1  Hermite  polynomials 

The  71th  Hermite  polynomial  (x)  is  a solution  of 


”_2x— ” + 27iH  = 0 
dx^ 


The  (x)  are  derived  by  Rodrigues’  formula  [18] 


(17) 


H„{x)  = 

dx 


(18) 


The  computation  of  (jc)  is  especially  easy  using  the  following  recursive  relations: 
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^f„  + i W = -2nH„_j(A:)  (19) 

HqW  = 1 
Hj  (x)  = 2x 

2 -X 

By  substituting  G {x)  (with  variance  o ) for  e in  (18),  we  generalize  to  Hermite  polynomials 
with  respect  to  the  Gaussian  function.  Let  these  Hermite  polynomials  be  denoted  by  (jc) 

H„(x)  = {-l)"G~\x)-^iG{x))  (20) 

dx 

Note  that  (jc)  differs  from  (x)  by  a scaling  product: 


= 


1/2  J “nl  1/2 
2 ^2  G 


(21) 


The  scalar  product  of  two  functions  and  the  L2-norm  of  a function  with  G (x)  as  a weight  function 


are  defined  as  follows: 


{a,  b)  = j G{x)  a (x)  b (;c)  dx  and  ||(3||  = 

— oo 

The  orthogonality  of  (;c)  } can  be  expressed  in  the  following  way[18]  : 

(H„,H„)  = (22) 

The  3D  case  of  Hermite  polynomials  is  especially  simple  because  they  are  separable.  Thus  the 
polynomial  with  order n =■  i -¥  j + kis 

Hijkix,y,t)  = Hiix)  'Hj(y)  'Hk(t)  (23) 

3.2  Derivation  of  gradient  constraint  equations 

One  of  the  most  important  properties  of  Hermite  polynomials  is  the  property  of  Gaussian  deriva- 
tives. It  is  with  the  aid  of  this  property  that  we  are  able  to  establish  gradient  constraint  equations. 
This  property  is  manifested  in  the  following  theorem. 
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Theorem  1:  A one  dimensional  signal  I{x)  can  be  expanded  in  terms  of  Hermite  polynomials  as 


/w  = I/; 


Hkix) 


k = 0 


Hi 


Then  4 = {I,Hk)  = where  Ho(^)  = 1 and/f*^)  = 


k 

dl 

dx^ 


(24) 


The  proof  is  given  in  Appendix  A.  This  theorem  states  that  the  ^th  order  Gaussian  derivative  of 

the  image  is  the  inner  product  of  the  image  and  the  kth  order  Hermite  polynomial  (x)  . Note 
that  the  theorem  is  true  only  for  unnormalized  Hermite  polynomials.  This  fact  is  used  when  we 
assign  weights  to  motion  constraint  equations  of  different  orders  in  equation  (35). 


Recall  our  image  motion  equation  (15), 

/ {x,  y,t)  = a + yx  + py  + bx^  + 8xy J,  y + ?(^  P - pjc  + yy  + 6xy  + sy' 

Expand  both  sides  with  Hermite  polynomials, 


oo  oo  oo 


i = 0j  = 0k  = 0 


oo  oo  oo 


ijk 


— = I I I ‘iJk  = ^ijk>  = Pijk  = Hijk)  (25) 


Hijk 


i = Oj  = Ok  = 0 


fiijk\ 


Equating  I to  j and  using  Theorem  1,  we  derive 


hn  = ^in  = 


\x. 


2 ^3F  f 2^3F  

= <[a  + 7A:  + py  + 6j:  +SJr>'J— + 1 P-px  + Yy  + 5xy H-ey 

“o  °yo 


= ((^a  + YA:  + Xp  + 5j:^  + ej:>>]|^,//yo)  + ((p  - + "O' + Wyo) 

We  make  the  above  approximation  because  equations  (9)  and  (16)  allow  us  to  derive 


(26) 


dF  didx  didy  bl  , ^ x x 


(27) 
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Since 


dF 


dXr 


X = 0,y  = 0,  t = 0 


K 

dx 


X = 0,y  = 0,  t = 0 


and  the  inner  product  is  Gaussian  weighted,  the 


error  is  not  significant.  Besides,  without  the  approximation,  the  eventual  constraint  equation 

dF 

would  be  non-linear  and  very  difficult  to  solve.  The  analysis  is  similar  for  — . Therefore,  we 


arrive  at 


1 = < [ « + yj:  + py  + j|^.  Hijo)  + ([  P - p j;  + yv  + j|^^yo)  • (28) 

/••I  = a Hyo)  + Y(|^,  xH^jo)  + P yHijo)  + 5 x%jo)  + e xyH^jo)  (29) 

+ p (^,  Hijo)  - p xHijo)  + ySyo)  + 5 xyHijo)  + e {^,  y%jo) 

To  simplify  (29),  we  derive  from  (19)  and  (21)  the  following  equation 

xH^  (x)  = + 1 (x)  + nH^  _ i (x)  . Hence  (30) 

= a{^,  %)  +7<|^,  + iH,-.  yo>  + p{|^,  (j^Hy^i.o  +yHy-  i,o)  (31) 

+ 5(|^  , a%^2jo  + (2/  + 1) o^Hyo  + i(i  + 1) ^,-2;0> 

Hi  ^ \j  + \^Q  i(5  F[i_ij  j(5  + ly  - 1,  0 ly  - 1,  o) 

+ p , ffyo)  - P (^  . + UO  + iHi  -ijQ}+  y{^  , o%j  ^ 1, 0 +jHij.  1,  o> 

+ t{^,O%j^2.0+  i2j+  l)0%jo+jU+  l)Hy.2,0> 

Using  Theorem  1,  we  simplify  (31)  to 
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(32) 


As  stated  in  the  introduction,  this  is  a fundamental  element  resulting  in  a coherent  spatio-temporal 
filtering  scheme  to  compute  optical  flow.  This  capability  stems  from  two  nice  properties  of  (32). 
The  first  is  the  linearity  of  the  equation  in  terms  of  the  motion  parameters  as  defined  in  (16);  the 
second  is  its  extensibility  to  higher  orders,  i.e.,  the  values  of  i,j  can  be  as  large  as  required  by  the 
number  of  parameters.  Thus  to  solve  for  the  motion  parameters  and  then  for  optical  flow,  we 
derive  a system  of  linear  equations  with  coefficients  computed  from  spatio-temporal  Alters  . 

These  result  in  excellent  numerical  stability  due  to  the  orthogonality  of  the  Hermite  polynomial 
differentiation  Alters  and  their  inherent  Gaussian  smoothing.  Although  (32)  appears  to  be  compli- 
cated, it  in  fact  suggests  a simple,  local,  and  parallel  algorithm,  which  involves  only  convolutions 
and  solving  a linear  system,  as  presented  in  the  next  section. 

4.  Algorithms  for  Computing  Optical  Flow 

Equation  (32)  gives  rise  to  not  a specific  algorithm  but  a set  of  algorithms,  due  to  its  extensibility. 
We  can  derive  the  same  number  of  equations  as  unknowns  and  solve  a linear  system,  or  we  may 
incorporate  more  equations  of  higher  order  and  solve  a linear  least  square  problem.  On  the  other 
hand,  if  we  possess  knowledge  about  the  input  image  sequence,  for  example,  that  there  are  no 
rotations  around  certain  directions,  the  number  of  motion  unknowns  can  be  reduced.  We  also 
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make  other  expedient  choices  based  on  numerical  considerations  and  experimental  findings.  All 
these  options  are  explored  in  the  following  subsections. 


4.1  The  general  algorithm 

According  to  (32),  we  derive  six  equations  up  to  the  third  order.  Within  a 3-D  local  window,  we 

estimate  } with  the  discrete  approximation  { (x,  y,t)  },  that  is,  the  3-D  convolution  of  the 

sampled  Hermite  polynomial  filters  with  the  image  sequence,  and  write  the  equations  in  matrix 
vector  form: 


M^s  = c , where  = (M^  0 where  0 means  concatenation,  and  (33) 


Y ^ 2 - - 

_ — 

^001 

^100  AlO  ^ (A00  + A20)  0 

a 

A ^ 2 ^ ^ ^ ^ 

p 

Aoi 

hoo  Aio  ^ (A00  + A20)  +^100  ~Aio 

Y 

, c = 

^011 

^110  A2O  ^ (A1O  + A30)  +A10  ^100 

P 

^201 

^ 2 ^ ^ ^ ^ 

Aoo  AlO  ^ (/400  + A20)  + 2/200  “2/iio 

6 

8 

All 

Alo  A20  ^ (Alo + A30)  + 2/110  Aoo“A2o 

/021_ 

^120  A30  ^ (/220  + A40)  + 2/020  2/110 

(34) 


4 ^ ^ 2 ^ 

^ (-^300 + -^120)  IlOO 


4 ^ /V  2 ^ 

^ (^210  ^030)  ^010 


^ (1400  + 1220)+^  (3/200  "^^020)  +2^000  ^ (/3IO  ^130)  + ^ 2/110 


^ (^310'*'A3o)  2/110 


4 ^ ^ 2 ^ ^ ^ 

^ (/220  '''^040)  ( 3/020  -^200)  + 2/000 


^ (^500  + ^320)  (5/300  + 2/120)  +6/100  C7  (/410  + /230)  +^  3/210 

^ (/4IO +/230)  + ^ (4/210  + ^30)  + 3/010  ^ (/320  + A4o)+^  (4/i20  + ^30o)  + 5/100 


^ (/32O  + l050)  + ^ 3/120 


4 ^ ^ 2 ^ ^ 

^ (^230  + l050)  + ^ (5/o30  + 2/210)  + 6/010 


Note  that  filter  outputs  of  up  to  fifth  order  are  used,  s can  be  solved  exactly  by  (33)  for  the  center 


pixel  of  the  3-D  window  and  optical  flow  at  the  center  of  the  local  window  is  (-a,  -(3)  . 
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4.2  Specialized  algorithms 

The  algorithm  presented  in  the  previous  subsection,  i.e.,  solving  a full-rank  linear  system,  is  often 
an  overkill  for  many  image  sequences.  There  are  several  disadvantages  in  using  it  for  simple  mo- 
tion: First,  it  requires  the  use  of  higher  order  filters,  whose  orthogonality  is  always  distorted  in  a 
limited  local  support;  and  the  use  of  a large  local  support  is  undermined  by  its  susceptibility  to 
motion  boundary  effects.  Second,  the  linear  system  is  often  highly  ill-conditioned,  since  the  col- 
umns of  the  matrix  are  of  different  orders  of  magnitude.  Third  is  the  higher  demand  in  computa- 
tion. 


Let  the  focal  length  be  large  enough  and/or  the  rotation  in  the  X,  Y direction  small  enough  for  6 
and  £ to  be  negligible.  Then  M2  = 0 and  we  have  a linear  least  square  problem: 


E = Z?||  where  = WM^,b  = Wc,  and 

W = Diag  [wp  W2,  W3,  W4,  W5,  Wg]  . (35) 

W is  the  diagonal  weight  matrix  for  the  motion  constraint  equations.  According  to  (25),  we  use 


1 

-2 

-2 

_ 

-2  1 

1 

to 

_ 

t-2 

1 

Fool 

1 

to 

II 

^^101 

II 

1 

Hon 

II 

1 

^201 

1 

II 

^^111 

1 

ON 

II 

^02 1| 

Note  that  this  formulation  reduces  the  highest  order  filter  to  fourth  order.  In  addition,  a least 
square  solution  is  more  stable  than  a full-rank  linear  system.  The  applicability  of  the  weight  ma- 
trix is  another  nice  feature.  We  suggest  solving  (35)  by  QR  decomposition: 


^4  = QR  ^ E = min\\QRs  + b\\  = min  |r5  + bl , where  Q is  unitary. 


(36) 


R can  be  denoted  by 


0 


H, 


, where  R is  an  upper  triangular  matrix;  and  Q b is 


, correspond- 


ingly. Equation  (36)  then  becomes 
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E = minqR^s  + bJI^  + lbJl) 

= 1^7^  = r if  is  not  singular.  (37) 

The  solution  s is  computed  from  R^s  + = 0 (38) 

For  all  practical  purposes,  the  above  algorithm  is  adequate  for  computing  optical  flow.  Nonethe- 
less, for  many  synthetic  images  or  synthesized  real  images  [4] , there  is  still  room  to  simplify  the 
algorithms  and  improve  the  stability  of  the  linear  system.  For  example,  in  image  sequences  where 
= 0 , we  get  p = 0 and  (35)  reduces  to 

E = m/72||A353  - ^1 , (39) 

where  A3  and  are  the  first  three  columns  and  elements  of  A^  and  , respectively. 

Furthermore,  the  third  column  in  A3  involves  higher  order  terms  plus  a lower  order  term.  Experi- 
ments suggest  that  the  lower  order  term  is  always  dominant  and  more  accurate.  Neglecting  the 
higher  order  terms  does  not  necessarily  degrade  the  accuracy  but  does  save  a great  deal  of  com- 
puting time.  This  finding  is  very  crucial  especially  for  real-time  implementations. 

In  terms  of  computing  efficiency,  our  algorithm  is  excellent  due  to  the  separability  of  the  3-D  Her- 
mite  polynomial  filter  design.  Let  the  3-D  filter  size  be  W xW  x W and  image  size  be  5 . The 

complexity  of  the  computation  is  O {{W^+W^  + W^+  C)  S)  , instead  of  O ( {W^W^Wj  + C)  S)  , 

where  C is  the  constant  factor  associated  with  solving  the  hnear  system.  In  addition,  the  above 
process  can  run  on  all  image  pixels  in  a parallel  fashion.  It  can  achieve  maximum  speedup  run- 
ning on  a CREW  (Concurrent  Read  Exclusive  Write)  parallel  machine  [22] . 

One  of  the  advantages  of  using  the  QR  decomposition  is  the  availability  of  the  matrix  R^  and  the 

residual.  The  behavior  of  R^  and  the  residual  value  reveal  plenty  of  information  about  the  under- 
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lying  images  and  motions.  There  are  certain  situations  where  the  optical  flow  cannot  be  reliably 
computed  from  local  information  due  to,  for  example,  the  aperture  problem.  Therefore,  and 

the  residual  should  be  investigated  for  their  possible  link  to  the  reliability  of  the  computed  optical 
flow.  We  devote  the  next  subsection  to  the  discussion  of  the  optic  flow  errors  and  confidence  mea- 
sures. 

4.3  Confidence  measures 

Our  algorithm  provides  ample  information  about  the  behavior  of  the  system  equations.  It  includes 
the  residual  r , the  condition  number  and  the  determinant  of  . They  can  be  shown  [25]  to  signi- 
fy certain  image  phenomena,  e.g.,  occlusion,  the  aperture  problem,  etc.,  which  present  difficulties 
for  optical  flow  computation.  Therefore,  they  can  be  utilized  to  locate  high  error  areas  and  suggest 
subsequent  improvement  methods.  For  the  sake  of  the  evaluation  in  Section  6,  we  simply  use 
them  as  confidence  measures  or  threshold  values  to  extract  more  accurate  data,  notwithstanding 
the  fact  that  they  can  be  used  for  qualitative  image  analysis. 

4.3.1  Residual 

The  residual  of  our  algorithm  is  + ^||  or  r (=E)  (37).  The  residual  of  an  overdetermined  lin- 
ear system  indicates  the  degree  to  which  the  equations  disagree  with  one  another.  The  reason  for 
the  existence  of  the  residual  lies  in  the  approximation  error  of  { {x,  y,t)  }.  A high  approxima- 
tion error  may  indicate  one  of  three  problems: 

1.  The  assumption  of  the  motion  model  is  violated  in  the  3-D  window  It  is  possible 
that  the  window  covers  more  than  one  moving  object.  Occlusion  or  multiple  indepen- 
dently moving  objects  in  a window  can  cause  this  problem. 

2.  The  assumption  of  constant  image  brightness  is  violated.  It  is  not  unusual  for  the 
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brightness  of  an  object  to  change  when  the  viewing  angle  changes  due  to  relative 
motion.  In  addition,  the  observing  camera  may  adjust  the  brighmess  gain  for  different 
scenes,  resulting  in  a change  of  object  brightness.  Similar  effects  can  be  caused  by 
the  shadow  of  another  object. 

3.  Quantization  and  truncation  errors.  Quantization  errors  result  from  sampling  Hermite 
polynomial  filters.  Truncation  errors  are  introduced  when  we  use  limited  spatial  sup- 
port to  compute  { (;c,  y,t)  }.  Within  a small  window,  the  Hermite  polynomials  are 

no  longer  orthogonal  and  the  derivatives  computed  are  not  accurate.  This  situation  is 

—2/2 

worse  for  higher  order  differentiation  filters.  For  example,  H^)/c  n\  (22)  is 
0.93  when  n = 5 and  0.999945  when  n = 1 for  a window  size  of  21. 

We  can  model  the  above  errors  as  perturbations  to  the  linear  system  [25] : 


E - m/n||  + AO  5 + {b-¥  AZ?)  || , where  N and  AZ?  denote  errors  and  n<6 

We  prove  in  Appendix  B the  following  analytical  results: 


A5  = 5-5~-l  A^(A5  + AZ7) 


r = 


(40) 


(41) 

(42) 


Note  that  the  expressions  for  both  optical  flow  error  A5  (41)  and  residual  r (42)  are  proportional 

to  the  size  of  the  noise  vector  {Ns  + /^b)  . It  is  evident  that  locations  with  high  residual  reflect 
large  errors  and  inaccurate  optical  flow  values. 

Note  that  the  three  problems  mentioned  above  suggest  contradictory  choices  for  the  window  size. 
With  larger  windows,  problems  1 and  2 may  be  aggravated;  with  smaller  windows,  problem  3 


* It  may  be  deceptive  to  claim  that  the  residual  is  proportional  to  the  optical  flow  error  because  the  error  vec- 

f j A-l  j f T T 

tor  IS  mapped  by  different  matrices  (I  I A^  , I-A\A^  A^j  A^  ),  so  the  error  also  depends  on  the 

orientation  of  the  noise  with  respect  to  the  matrices. 
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becomes  worse. 


43.2  Condition  Number  and  Determinant 

The  condition  number  of  denoted  by  K , is  defined  as  ||  ||^J^||  and  can  be  shown  to  be 


1^1 


max 


, where  the  ^ ’s  are  eigenvalues  or  diagonal  elements  of  . 


A condition  number  measures  the  extent  to  which  a linear  system  maps  an  input  error  into  an  out- 
put error,  or  in  brief,  the  numerical  instability  of  the  system.  If  s contains  errors  magnified  by  an 
ill-conditioned  from  errors  in  b,  ii  is  not  reliable.  Since  matrix  is  concerned  with  the 

n n 

image  texture  only  and  not  with  motion,  we  find  correspondences  between  a high  condition  num- 
ber and  the  following  two  image  neighborhood  situations: 

1.  when  there  is  a steep  edge  in  the  x(y)  direction  (Fig  5.1),  so  that  the  derivatives  are 
very  large  for  jc(y)  and  small  for  y(x); 

2.  when  there  is  a lack  of  texture  in  a direction  (the  x + y direction  in  Fig  5.2),  so  that  the 
derivatives  in  the  x direction  are  approximately  proportional  to  the  derivatives  in  the 
y direction,  i.e.  I (x,y)  - 1 (kx  + y)  . 


Fig  5.1  Smoothed  steep  edge.  Fig  5.2  Lack  of  texture  in  x+y  direction. 


The  above  two  situations  can  easily  be  confirmed  by  inspecting  the  QR  decomposition  process. 
The  determinant  of  R^  is  the  product  of  all  its  eigenvalues.  In  solving  (38)  or  5 = -Rj^  • b^ , the 
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determinant  plays  an  important  role  in  the  matrix  inverse.  Since  we  use  the  QR  decomposition 
method,  Q is  unitary  (orthonormal  projection)  so  the  behavior  of  is  similar  to  the  original  . 

A small  determinant  of  indicates  one  of  the  following  two  situations: 

1.  The  three  columns  of  are  close  to  being  linearly  dependent.  This  is  the  same  as  the 

second  situation  in  the  above  discussion  of  condition  number.  In  fact,  a small  deter- 
minant due  to  linear  dependency  also  causes  a high  condition  number. 

2.  All  the  elements  of  are  very  small.  This  corresponds  to  a uniform  brightness  area, 
e.g.  a cloudless  sky. 

If  there  is  motion  in  the  area  where  one  of  these  two  situations  dominates,  then  it  corresponds  to 
what  is  known  as  the  aperture  problem.  As  noted  before,  the  above  situation  corresponds  to  the 
general  case  of  the  aperture  problem.  It  is  interesting  to  note  that  Barron,  Reet,  and  Beauchemin 
[4]  recognize  empirically  that  the  determinant  is  a better  confidence  measure  in  the  application  of 
the  liras,  et  al.  [37]  optical  flow  algorithm  than  the  condition  number  used  in  the  original  paper. 
Our  analysis  agrees  with  their  empirical  finding. 

4.3.3  Integration  of  confidence  measures 

Based  on  the  above  analysis,  we  choose  a combination  of  confidence  measures  according  to  the 
nature  of  a given  image  sequence. 

If  the  image  sequence  contains  numerous  moving  objects  or  the  brightness  changes  significantly, 
residuals  should  be  used  as  the  confidence  measure,  as  they  capture  the  three  problems  listed  in 
Section  4.3.1.  No  other  confidence  measure  is  effective  for  these  cases. 

The  condition  number  and  determinant  have  something  in  common  although  they  may  capture 
different  situations.  Together  they  signify  the  relationship  between  numerical  instability  and  the 
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aperture  problem.  Empirical  findings  suggest  that  they  be  used  as  a multiplicative  combination,  or 
similarly,  in  the  form  of  P5] . This  has  been  proposed  by  Girosi  et  al.[13]  in  a similar  con- 
text and  was  used  in  Barron’s  implementation  [4]  of  Lucas  and  Kanade’s  optical  flow  algorithm. 
In  our  algorithm,  simply  indicates  the  existence  of  the  necessary  texture.  We  shall  use  it  to 

capture  the  aperture  problem  and  to  avoid  numerical  instability. 

All  the  above  mentioned  problems  are  not  unique  to  our  algorithm;  they  are  common  to  other  op- 
tical flow  algorithms  as  well. 

5.  Previous  Work  and  Our  Contributions 

Recent  research  in  the  field  of  optical  flow  seems  to  converge  on  two  ideas  to  be  discussed  in  this 
section.  They  are  spatio-temporal  filtering  and  generalized  motion  models. 

An  earlier  method  based  on  these  two  ideas  has  appeared  in  [35]  by  Srinivasan.  In  this  approach 
to  a generalized  gradient  method  for  optical  flow,  the  author  concentrated  on  generalizing  spatio- 
temporal  filtering.  He  demonstrated  his  algorithms  on  various  types  of  motion.  However,  the 
algorithms  did  not  deal  with  motion  that  simultaneously  contained  translation,  expansion,  and 
rotation.  In  fact,  it  is  stated  that  “erroneous  results  can  occur  if  a translatory  motion  is  superim- 
posed upon  the  rotation  or  expansion”.  Nonetheless,  [35]  is  one  of  the  first  efforts  in  generalizing 
the  optical  flow  algorithms. 

Later,  Workhoven  and  Koenderink  [42]  introduced  the  idea  of  the  affine  flow  field  (43)  to  esti- 
mate optical  flow: 


“o 

u 

V (x)  = T + Ax  where  T = 

and  A = 

X y 

V V 

L ^ 3d 
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A series  of  algorithms  [5,7,8,31,39]  using  this  more  comprehensive  flow  field  followed. 

Based  on  an  infinitesimal  affine  flow  field,  both  Workhoven  and  Koenderink  [42]  and  Nagel  [31] 
used  Taylor  series  expansion  and  2-D  Gaussian  derivative  filters  to  derive  motion  constraint  equa- 
tions. These  equations  are  organized  in  a linear  system  in  a similar  way  in  both  studies.  Their 
work  can  be  regarded  as  an  extension  of  Horn  and  Schunck’s  work  [20] . Our  work  is  inspired  by 
Workhoven  and  Koenderink’ s algorithm  because  an  extensible  motion  constraint  equation  similar 
to  (32)  was  developed  in  [42] , though  only  in  2-D.  However,  the  affine  model  is  not  based  on  the 
pointwise  3-D  motion  analysis  so  their  motion  equation  fails  to  recognize  the  dependencies  of  the 
first  order  motion  parameters,  i.e.,  y = u = v and  p = m = v . This  causes  numerical  insta- 

X y y X 

bility.  Hence  their  approach  does  not  offer  an  algorithm  with  competitive  experimental  results.  In 
fact,  our  implementation  of  their  algorithm  shows  that  it  is  not  reliable. 

Campani  and  Verri  [7] , Bergen  et  al.  [5] , and  Wang  and  Adelson  [39]  used  local  flow  field 
coherence  rather  than  the  Taylor  expansion  to  compute  flow.  Their  work  can  be  regarded  as  an 
extension  of  Lucas  and  Kanade’s  work  [27] . They  do  not  demand  high-order  gradients  but  need 
to  perform  patchwise  computation.  Patchwise  computation  is  accurate  when  the  motion  has  been 
segmented  but  inaccurate  otherwise  due  to  its  strong  susceptibility  to  the  aperture  problem. 

Chou  [8]  modeled  the  error  in  the  affine  flow  field  as  independent  Gaussian  noise  and  used  Max- 
imum a Priori  (MAP)  estimation  to  minimize  the  error  and  compute  optical  flow.  We  have  shown 
in  (10)  that  the  error  modeled  in  [8]  consists  of  exactly  the  quadratic  terms.  It  is  actually  system- 
atic and  dependent  on  motion.  Therefore,  this  noise  model  is  not  adequate. 

Prior  to  the  affine  flow  model.  Hartley  [17]  had  proposed  a quadratic  flow  field  model  and  used 
pyramid  linking  to  estimate  and  segment  flow  simultaneously.  This  is  the  first  use  of  the  “correct” 
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motion  model  in  an  algorithm.  The  integration  of  estimation  and  segmentation  is  a valuable  les- 
son for  future  research,  but  the  lack  of  temporal  or  even  spatial  filtering  to  deal  with  noise  is  its 
weakness. 

We  realize  that  modeling  a fiow  field  is  essentially  a 2-D  process,  whereas  modeling  motion  is  a 
3-D  process,  which  is  relatively  difficult.  However  we  can  impose  temporal  smoothing  in  an  inte- 
grated theoretical  framework  based  on  Hermite  polynomials. 

Heeger  [19] , Fleet  and  Jepson  [11] , and  Weber  and  Malik  [41]  also  achieved  success  using  spa- 
tio-temporal filters.  However,  they  all  used  a uniform  translational  motion  model  and  their 
improvements  are  limited  by  this  assumption.  Among  them.  Fleet  and  Jepson  attempted  to  cope 
with  non-translational  motion  in  [11] . They  showed  that  the  phase  response,  instead  of  the  ampli- 
tude response,  of  the  velocity-tuned  filters  is  robust  to  image  affine  transformations  and  photomet- 
ric deformation.  Their  algorithm  is  based  on  constant  phase  contours  and  tends  to  produce  more 
accurate  but  sparse  flow  fields. 

If  the  above  methods  could  take  advantage  of  the  image  motion  equation  (15),  which  deals  with 
arbitrary  3-D  motion,  greater  improvements  might  be  achieved.  However,  these  methods  might 
have  difficulties  with  the  spatial  nonlinearity  (specifically,  quadratic)  and  the  number  of  parame- 
ters involved.  Hermite  polynomial  filters,  on  the  other  hand,  have  proved  to  be  capable  of  over- 
coming these  difficulties. 

From  a theoretical  point  of  view,  the  image  motion  corresponding  to  arbitrary  3-D  motion  has 
been  studied  by  Longuet-Higgins  and  Prazdny[26]  and  Fang  and  Huang  [10] . We  have  pushed 
the  effort  forward  not  only  by  integrating  temporally  continuous  analysis  but  also  by  exploring 
numerical  implementations.  The  figure  below  summarizes  the  thread  of  work  leading  to  our  algo- 
rithm. An  arrow  in  the  figure  represents  an  idea  extracted,  extended  or  used  similarly. 
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At  the  application  level,  our  algorithm  generates  a set  of  confidence  measures  that  we  prove 
reflect  physical  phenomena  about  the  image  and  motion.  These  measures  can  then  be  used  for 
subsequent  qualitative  processing.  In  experiments,  our  algorithm  generates  accurate  and  dense 
results,  which  are  very  useful  for  such  tasks  as  motion  segmentation  and  obstacle  detection. 

In  summary,  the  contributions  of  this  work  are  a general  motion  model  that  lends  itself  to  use  with 
any  good  spatio-temporal  filtering  methods  for  estimating  accurate  optical  flow,  and  a potent  Her- 
mite  polynomial  theory  for  motion  analysis. 


6.  Experiments 


Based  on  the  work  of  Barron,  Fleet,  and  Beauchemin[4] , we  conducted  extensive  comparisons 
between  our  algorithm  and  traditional  optical  flow  algorithms,  including  those  by  Horn  and 
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Schunck[20] , Lucas  and  Kanade[27] , liras  et  al.  [37] , Nagel[32] , Anandan[3] , Singh[34] , 
Heeger[19] , Waxman  et  al.  [40] , and  Fleet  and  Jepson[ll] . The  synthetic  image  sequences  we 
used  for  comparison  are  Sinusoid,  Translating  tree,  Diverging  tree,  Yosemite  fly-by  (provided  by 
Barron),  and  Moon  landing. 

If  the  image  sequences  used  contain  only  translational  (Sinusoid)  and  diverging  motion  (Translat- 
ing tree.  Diverging  tree,  and  Yosemite  fly-by),  we  use  the  algorithm  in  (39);  if  the  image  motion 
also  contains  rotation  (Moon  landing),  we  use  the  algorithm  in  (35). 

The  error  statistic  utilized  is  the  angle  error  between  the  computed  optical  flow  time-space  direc- 
tion v^,  1)  and  the  ground  truth  flow  time-space  direction  {u^,  v^,  1)  averaged  over  the 

whole  image.  Refer  to  [4]  for  more  details.  In  order  to  make  extensive  comparisons,  we  imple- 
mented our  algorithm  in  such  a way  that  a certain  density  of  output  flow  can  be  extracted  by 
thresholding  on  a chosen  confidence  measure.  Error  statistics  in  the  following  subsections  are  dis- 
played in  tables.  For  a single  technique  with  multiple  entities  in  these  tables,  different  threshold 
values  are  used  in  the  algorithm  to  produce  multiple  densities  of  output.  For  the  actual  threshold 
values  of  the  comparison  algorithms,  refer  to  Barron  et  al.[4] . The  error  statistics  and  associated 
density  for  the  comparison  algorithms  were  obtained  directly  from  [4] . 

6.1  Sinusoid 

This  is  a synthetic  image  sequence  (Fig  6)  of  a spatial  sinusoidal  wave  traversing  toward  the  up- 
per right  side.  For  our  method  we  chose  a window  size  large  enough  (17x17x7  for  x,y,t)  to  prevent 
aliasing.  |l/r|  was  used  as  the  confidence  measure.  Fig  7.1  shows  the  true  optical  flow,  while 
Fig  7.2  shows  the  flow  computed  with  our  method.  Our  algorithm  performs  better  than  all  of  the 
other  algorithms  except  Fleet  and  Jepson’s  (Table  2). 
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Fig  6.  Traversing  sinusoid 


Fig  7.1  True  optical  flow  for  sinusoid  Fig  7.2  Computed  optical  flow  (100%) 

Table  2:  Summary  of  Sinusoid  error  statistics 


Density 

Our  Algorithm 

Other  Algorithms 

Average 

Error 

Standard 

Deviation 

Average 

Error 

Standard 

Deviation 

Technique  by 

100% 

0.63° 

0.06° 

4.19° 

0.50° 

Horn  & Schunck  (original  unthresholded) 

2.55° 

0.59° 

Horn  & Schunck  (modified  unthresholded) 

2.47° 

0.16° 

Lucas  and  Kanade  (unthresholded) 

2.59° 

0.71° 

Uras  et  al.  (unthresholded) 

2.55° 

0.93° 

Nagel 

30.80° 

5.45° 

Anandan 

2.24° 

0.02° 

Singh  (step  1 unthresholded) 

0.03° 

0.01° 

Beet  and  Jepson 

12.8% 

0.63° 

0.06° 

64.26° 

26.14° 

Waxman  et  al. 

6.2  Translating  tree  and  Diverging  tree 

The  translating  and  diverging  tree  sequences  are  two  realistic  synthetic  sequences  simulating  the 
motion  of  simple  translation  (Fig  8.1)  and  expansion  (Fig  8.2),  respectively,  of  a poster.  The  win- 
dow size  used  in  our  method  is  19x19x11  for  the  translating  tree  and  17x17x9  for  the  diverging 
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tree.  Due  to  the  lack  of  texture  in  some  background  areas,  we  used  as  the  confidence  mea- 

sure. Fig  9 and  Fig  10  show  the  results.  Only  liras’  and  Fleet  and  Jepson’s  algorithms  perform 
better  than  ours  for  the  translating  tree  sequence  (Table  3).  For  the  diverging  tree  sequence,  our 
results  are  second  only  to  Reet  and  Jepson’s  (Table  4). 


Fig  8.1  Translating  tree 


Fig  8.2  Diverging  tree 


Fig  9.1  True  flow  for  Translating  tree 


Fig  9.2  True  flow  for  Diverging  tree 


Fig  1 0.1  Computed  flow  for  Translating  tree 
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Table  3:  Summary  of  Translating  tree  error  statistics 


Density 

Our  Algorithm 

Other  Algorithms 

Average 

Error 

Standard 

Deviation 

Average 

Error 

Standard 

Deviation 

Technique  by 

100% 

0.92“ 

0.94° 

38.72° 

27.67° 

Horn  & Schunck  (original  unthresholded) 

2.02° 

2.27° 

Horn  & Schunck  (modified  unthresholded) 

0.62° 

0.52° 

liras  et  al.  (unthresholded) 

2.44° 

3.06° 

Nagel 

4.54° 

3.10° 

Anandan 

1.64° 

2.44° 

Singh  (step  1 unthresholded) 

1.25° 

3.29° 

Singh  (step  2 unthresholded) 

99.6% 

0.91° 

0.92° 

1.11° 

0.89° 

Singh  (step  2) 

74.5% 

0.69° 

0.51° 

0.32° 

0.38° 

Fleet  and  Jepson 

53-57% 

0.59° 

0.39° 

32.66° 

24.50° 

Horn  & Schunck  (original) 

5.63° 

2.78° 

Heeger  (level  1) 

1.89° 

2.40° 

Horn  & Schunck  (modified) 

49.7% 

0.57° 

0.37° 

0.23° 

0.19° 

Fleet  and  Jepson 

44.2% 

0.55° 

0.34° 

8.50° 

13.50° 

Heeger  (level  0) 

40-42% 

0.53° 

0.33° 

0.46° 

0.35° 

liras  et  al. 

0.72° 

0.75° 

Singh  (step  1) 

0.66° 

0.67° 

Lucas  and  Kanade 

26.8% 

0.48° 

0.28° 

0.25° 

0.21° 

Fleet  and  Jepson 

13.1% 

0.42° 

0.24° 

0.56° 

0.58° 

Lucas  and  Kanade 

1.9% 

0.35° 

0.19° 

6.66° 

10.72° 

Waxman  et  al. 

Table  4:  Summary  of  Diverging  tree  error  statistics 


Density 

Our  Algorithm 

Other  Algorithm 

Average 

Error 

Standard 

Deviation 

Average 

Error 

Standard 

Deviation 

Technique  by 

100% 

1.84° 

1.33° 

12.02° 

11.72° 

Horn  & Schunck  (original  unthresholded) 

2.55° 

3.67° 

Horn  & Schunck  (modified  unthresholded) 

4.64° 

3.48° 

liras  et  al.  (unthresholded) 

2.94° 

3.23° 

Nagel 

7.64° 

4.96° 

Anandan 

17.66° 

14.25° 

Singh  (step  1 unthresholded) 

8.60° 

4.78° 

Singh  (step  2 unthresholded) 

99% 

1.82° 

1.28° 

8.40° 

4.78° 

Singh  (step  2) 

73.8% 

1.59° 

1.12° 

4.95° 

3.09° 

Heeger  (combined) 

60-61% 

1.49° 

1.02° 

0.99° 

0.78° 

Fleet  and  Jepson 

8.93° 

7.79° 

Horn  & Schunck  (original) 

3.83° 

2.19° 

Uras  et  al. 

46-48% 

1.40° 

0.92° 

2.50° 

3.89° 

Horn  & Schunck  (modified) 

0.80° 

0.73° 

Reet  and  Jepson 

1.94° 

2.06° 

Lucas  and  Kanade 

28.2% 

1.28° 

0.79° 

0.73° 

0.46° 

Fleet  and  Jepson 
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Table  4:  Summary  of  Diverging  tree  error  statistics 


Density 

Our  Algorithm 

Other  Algorithm 

Average 

Error 

Standard 

Deviation 

Average 

Error 

Standard 

Deviation 

Technique  by 

24.3% 

1.24“ 

0.77“ 

1.65“ 

1.48“ 

Lucas  and  Kanade 

3.9-4.9% 

1.09“ 

0.66“ 

13.69“ 

11.83“ 

Waxman  et  al. 

5.62“ 

6.16“ 

Singh  (step  1) 

6.3  Yosemite  fly-by 

The  Yosemite  Fly-by  sequence  is  a realistic  synthetic  image  sequence  (Fig  11).  The  flight  scene  is 
simulated  using  actual  aerial  photos  and  digital  terrain  maps,  with  artificial  sky  and  clouds.  Since 
the  clouds  in  the  sky  change  brightness  over  time,  it  poses  difficulties  for  all  algorithms.  Based  on 
our  previous  analysis,  we  used  |l/r|  as  the  confidence  measure  to  eliminate  points  that  lie  in  a 
large  blank  area  in  the  sky  and  on  motion  boundaries  in.  Fig  12.2.  shows  the  results.  Since  the  mo- 
tion is  rather  fast  in  some  areas,  we  used  a larger  window  (21x21x7).  Error  statistics  are  shown  in 
Table  5.  Again,  the  clouds  account  for  the  large  magnitude  error.  Our  algorithm  performs  better 
than  all  others. 


Fig  11 . Yosemite  fly-by  image 
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Fig  12.1  True  optical  flow  field  for  Yosemite  fly-by  Fig  12.2  Computed  optical  flow  for  Yosemite  fly-by 


Table  5:  Summary  of  Yosemite  fly-by  error  statistics 


Density 

Our  Algorithm 

Other  Algorithms 

Average 

Error 

Standard 

Deviation 

Average 

Error 

Standard 

Deviation 

Technique  by 

100% 

7.13° 

13.19° 

31.69° 

31.18° 

Horn  & Schunck  (original  unthresholded) 

9.78° 

16.19° 

Horn  & Schunck  (modified  unthresholded) 

8.94° 

15.61° 

Uras  et  al.  (unthresholded) 

10.22° 

16.51° 

Nagel 

13.36° 

15.64° 

Anandan 

15.28° 

19.61° 

Singh  (step  1 unthresholded) 

10.44° 

13.94° 

Singh  (step  2 unthresholded) 

97.7% 

6.39' 

6.39° 

10.03° 

13.13° 

Singh  (step  2) 

64.2% 

2.99° 

4.54° 

22.82° 

35.28° 

Heeger  (level  0) 

59.6% 

2.85° 

4.15° 

25.33° 

28.51° 

Horn  & Schunck  (original) 

44.8% 

2.57° 

3.50° 

15.93° 

23.16° 

Heeger  (combined) 

33-35% 

2.41° 

3.32° 

4.28° 

11.41° 

Lucas  and  Kanade 

4.63° 

13.42° 

Fleet  and  Jepson 

5.59° 

11.52° 

Horn  & Schunck  (modified) 

6.06° 

12.02° 

Nagel 

30.6% 

2.38° 

3.24° 

5.28° 

14.34° 

Fleet  and  Jepson 

15% 

2.21° 

3.06° 

9.87° 

14.74° 

Heeger  (level  1) 

7.55° 

19.64° 

Uras  et  al. 

8.7% 

2.16° 

3.05° 

3.22° 

8.92° 

Lucas  and  Kanade 

7.4% 

2.14° 

3.03° 

20.05° 

23.23° 

Waxman  et  al. 

2.4% 

1.91° 

2.12° 

12.93° 

15.36° 

Heeger  (level  2) 

6.4  Moon  landing 

The  Moon  landing  sequence  (Fig  13)  is  generated  by  gradually  rotating  and  expanding  a picture 
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Fig  13.  Moon  landing  sequence 


of  the  surface  of  the  moon.  Visually,  it  is  a bird’s-eye  view  of  the  moon  from  a spiral  landing 
spaceship.  The  purpose  of  this  sequence  is  to  demonstrate  our  algorithm’s  capability  to  handle 
complex  motion,  specifically,  expansion  plus  rotation.  Our  algorithm  used  a 21x21x7  window  and 
confidence  measure  since  there  is  no  motion  boundary.  Table  6 shows  that  our 

results  are  better  than  Fleet  and  Jepson’s  and  Lucas  & Kanade’s.  It  also  reveals  the  amount  of 
improvement  (10%-16%)  of  a generalized  motion  model  over  a uniform  translation  motion  model 
in  our  algorithm. 

Table  6:  Summary  of  Moon  landing  error  statistics 


Density 

Algorithms 

Average 

Error 

Standard 

Deviation 

Technique  by 

100% 

1.73° 

0.87° 

Our  algorithm  (Translation  + Rotation  +Expansion  model) 

1.91° 

0.89° 

Our  algorithm  (Translation  model) 

33.3% 

3.91° 

3.80° 

Lucas  and  Kanade 

1.37° 

0.71° 

Our  algorithm  (Translation  + Rotation  +Expansion  model) 

1.69° 

0.83° 

Our  algorithm  (Translation  model) 

31.1% 

2.47° 

1.71° 

Fleet  and  Jepson 

1.36° 

0.70° 

Our  algorithm  (Translation  + Rotation  +Expansion  model) 

1.68° 

0.82° 

Our  algorithm  (Translation  model) 

* Rotation  and  expansion  are  done  using  Khoros  1.5  vrotate  and  vresize  functions,  respectively. 
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Fig  1 4.3  Lucas  & Kanade’s  flow  field  (33.3%)  Fig  1 4.4  Fleet  & Jepson’s  flow  field  (31 . 1 %) 

6.5  Noise  sensitivity 

We  created  noisy  images  from  the  synthetic  sequences  used  above  and  tested  the  sensitivity  of  the 
algorithms  to  such  noise. 

The  sensitivity  analysis  is  motivated  by  simple  experiments  such  as  the  following:  On  a real-time 
image  processing  machine,  run  a temporal  differencing  algorithm  at  video  rate  on  successive 
frames  while  keeping  the  camera  and  the  scene  stationary.  Instead  of  getting  a uniform  output  of 
zero,  the  actual  output  always  contains  random  spots  of  non-zero  values.  This  kind  of  sensor  noise 
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violates  brightness  constancy  and  degrades  the  accuracy  of  any  optical  flow  algorithm. 


In  the  following  tables,  we  used  additive  Gaussian  noise  of  zero  mean  and  increasing  variance.  In 
order  to  conduct  a fair  comparison,  the  threshold  on  the  confidence  measure  is  fine-tuned  in  every 
single  run  so  that  the  output  density  is  always  50%.  We  chose  two  of  the  best  algorithms  in  [4] , 
Lucas  & Kanade  and  Fleet  & Jepson,  for  comparison.  For  the  noisy  Diverging  tree  sequence,  the 

noise  sensitivity  is  summarized  in  Table  5. 

Table  7:  Summary  of  Diverging  tree  noise  sensitivity  statistics 


Noise 

Standard 

Deviation 

Our  Algorithm 

Fleet  & Jepson 

Lucas  & Kanade 

Average 

Error 

Standard 

Deviation 

Average 

Error 

Standard 

Deviation 

Average 

Error 

Standard 

Deviation 

0 

1.41° 

0.94° 

1.09° 

0.52° 

3.04° 

2.53° 

3 

1.64° 

1.08° 

1.18° 

0.61° 

3.28° 

2.77° 

6 

2.03° 

1.37° 

1.51° 

0.93° 

3.62° 

3.06° 

9 

2.53° 

2.17° 

2.15° 

1.78° 

4.32° 

3.79° 

12 

3.28° 

2.78° 

3.83° 

5.48° 

5.17° 

4.69° 

15 

3.87° 

3.15° 

9.23° 

12.04° 

5.93° 

5.41° 

Our  algorithm,  as  well  as  Lucas  & Kanade ’s  has  an  approximately  linear  error  increase  with  noise 
while  Fleet  & Jepson’s  has  quadratic  or  even  exponential  error  increase  (Fig  15).  Despite  its 


Noise  Sensitivity 


excellent  accuracy  for  noise-free  data.  Fleet  & Jepson’s  algorithm  is  outperformed  by  the  other 
two  when  the  image  sequence  becomes  noisy.  We  also  conducted  a similar  experiment  with  the 
Yosemite  fly-by  sequence.  Unfortunately,  Fleet  & Jepson’s  algorithm  does  not  generate  high 
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enough  density  data  when  the  sequence  becomes  noisy.  However,  we  can  once  again  confirm  the 


linear  error  with  respect  to  noise  for  Lucas  & Kanade’s  and  our  algorithm  (Fig  16). 

+ :Our  algorithm 
0 : Lucas  andKanade 


0 5 10 

Gaussian  Noise  Level 

Fig  16.  Noisy  Yosemite  fly-by 

Robustness  to  noise  is  a very  crucial  quality  for  a good  optical  flow  algorithm.  We  hope  this 
experiment  prompts  more  research  in  this  area.  Our  algorithm  achieved  robustness  by  integrating 
spatio-temporal  smoothing  in  the  3-D  Hermite  polynomial  differentiation  filter  theory. 

7.  Real  Images  Demonstration 

Current  optical  flow  algorithms  often  have  difficulty  with  real  image  sequences.  Our  algorithm 
performs  best  on  the  Yosemite  and  Moon  landing  sequences  because  these  sequences  model  real 
3-D  motion  and  are  complicated  enough  to  reveal  the  virtues  of  our  algorithm.  We  therefore  ex- 
pect it  to  perform  well  on  real  images.  Here  we  demonstrate  our  algorithm  with  two  real  image  se- 
quences, HMMWV  and  NASA.  The  HMMWV  sequence  was  taken  in  an  outdoor  environment 
with  a camera  mounted  on  a forward  moving  HMMWV  (High  Mobility  Multipurpose  Wheeled 
Vehicle).  It  was  later  stabilized  to  eliminate  unsteady  motion.  The  NASA  sequence  is  an  indoor 
diverging  scene  obtained  from  Barron  [4] . The  flow  outputs  for  our  algorithm  as  well  as  Lucas  & 
Kanade’s  and  Fleet  & Jepson’s  are  displayed  in  Fig  17  and  Fig  18.  For  our  algorithm,  the  output 
has  undergone  thresholding  based  on  two  confidence  measures,  | l/r|  and  . 
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In  the  HMMWV  sequence  , visual  inspection  shows  that  Lucas  & Kanade’s  flow  output  (Fig 
17.3)  is  very  noisy  because  of  random  velocity  change  in  a small  neighborhood.  Fleet  & Jepson’s 
flow  output  (Fig  17.4)  shows  no  indication  of  the  flow  field  divergence.  It  is  probable  that  the  par- 
ticular implementations  (provided  by  Barron)  of  these  two  algorithms  are  not  tuned  to  the  rela- 
tively large  flow  existing  in  this  sequence.  Our  algorithm,  on  the  other  hand,  produces  coherently 
diverging  flow  field  (Fig  17.2)  except  in  the  area  of  the  sky. 


Fig  17.1  HMMWV  sequence 
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Fig  17.2  Our  algorithm’s  flow  field  (64%  density) 


Fig  1 7.3  Lucas  & Kanade’s  flow  field  (48%) 


Fig  17.4  Fleet  & Jepson’s  flow  field  (34%) 


* G = 0.8  is  used  for  both  algorithms  because  only  10  image  frames  are  available.  The  filter  size  of  our 
algorithm  is  21x21x7. 
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In  the  NASA  sequence  , both  our  flow  (Fig  18.2)  and  Reet  & Jepson’s  flow  outputs  (Fig  18.4)  are 
very  good,  while  Lucas  & Kanade’s  algorithm  produces  a noisy  flow  field  (Fig  18.3).  Note  that 
our  output  density  is  twice  that  of  Reet  & Jepson’s  but  it  achieves  approximately  the  same  accura- 
cy visually.  If  Fleet  & Jepson  were  to  generate  the  same  density,  it  might  not  look  as  accurate. 


Fig  18.1  NASA  sequence  Fig  18.2  Our  algorithm’s  flow  field  (75%  density) 


Fig  18.3  Lucas  & Kanade’s  flow  field  (48%  density)  Fig  18.4  Fleet  & Jepson’s  flow  field  (37%  density) 


* (7  = 2.0  is  used  for  both  algorithms.  The  filter  size  of  our  algorithm  is  21x21x7. 
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Finally  we  apply  the  NASA  flow  field  outputs  from  these  three  algorithms  to  an  obstacle  detection 
algorithm  developed  by  Young,  et  al.[45]  [46]  [47] . This  algorithm  discriminates  between  obsta- 
cle and  non-obstacle  regions  in  the  image  using  only  the  perpendicular  component  of  flow  to  arbi- 
trarily chosen  image  lines.  In  the  following  figures,  a protrusion  or  a depression  represents  an 
obstacle  detected  by  the  algorithm. 

For  the  first  set  of  data,  we  select  the  horizontal  lines  from  220  to  260  (Fig  19.2).  Over  these  lines. 


Line  45 
Line  90 

Fig  19.1  NASA  image  lines  45  to  90 


Line  220 
Line  260 

Fig  19.2  NASA  image  lines  220  to  260 


there  is  a vertical  long  metal  plate  with  a hole  in  the  left  end  of  the  image  strip.  We  should  expect 
two  depressions  at  the  locations  of  the  plate.  The  detection  results  from  all  lines  are  averaged  and 
then  displayed  in  Fig  20.  In  Fig  20.1,  Lucas  & Kanade’s  flow  does  not  detect  the  metal  plate 


laage  position  x 


leaps  position  x 


Irage  position  x 


Rou.l:20  to  Roe_L260  RCU.L220  to  Roy.L2SD  Rc«.L22n  to  Rca_L260 

Fig  20.1  Lucas  & Kanade’s  results  Fig  20.2  Fleet  & Jepson’s  results  Fig  20.3  Our  algorithm’s  results 


clearly;  in  Fig  20.2,  Fleet  & Jepson’s  flow  detects  the  metal  plate  but  its  shape  is  hardly  recogniz- 
able; in  Fig  20.3,  our  algorithm  not  only  detects  the  plate  but  also  shows  its  shape  as  should  be  ex- 
pected. 
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For  the  second  set  of  data,  we  select  horizontal  lines  45  to  90  (Fig  19.1).  Over  these  lines,  there  is 
a pole  on  each  end  of  the  image  strip  and  a coke  can  at  the  center.  In  Fig  21.1,  Lucas  & Kanade’s 
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Fig  21 .1  Lucas  & Kanade’s  results  Fig  21 .2  Fleet  & Jepson’s  results  Fig  21 .3  Our  algorithm’s  results 

flow  does  not  make  out  meaningful  objects.  In  Fig  21.2,  Fleet  & Jepson’s  flow  barely  detects  the 
coke  can  and  the  right  pole,  and  does  not  detect  the  left  pole.  In  Fig  21.3,  our  flow  clearly  detects 
all  three  objects.  Note  that  for  this  particular  image  strip  we  used  dense  (100%)  flow  for  our  algo- 
rithm. 

8.  Conclusion 

Motion  estimation  is  difficult  and  ill-posed.  The  past  two  decades  of  research  have  led  to  spatio- 
temporal  filtering  techniques  to  overcome  sensor  noise,  brightness  change  over  time,  and  quanti- 
zation error.  The  aperture  problem  is  mitigated  by  increased  filter  support  or  other  global  tech- 
niques, while  other  approaches  attempt  to  use  an  affine  motion  model  to  pursue  better  accuracy  in 
the  optical  flow.  We  have  learned  from  these  results  and  have  developed  an  integrated  approach 
that  combines  a general  motion  model  and  3-D  Hermite  polynomial  differentiation  filters.  The 
general  model  for  arbitrary  3-D  motion  is  useful  for  all  motion  algorithms,  but  better  numerical 
techniques  are  required  to  make  good  use  of  the  model.  We  have  found  that  Hermite  polynomial 
theory  provides  necessary  advantages  for  this  purpose.  It  possesses  many  elegant  properties, 
including  orthogonality,  extensibility,  Gaussian  smoothing,  etc.  Contrary  to  general  belief,  the 
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behaviors  of  these  high  order  differentiation  filters  are  quite  insensitive  to  noise.  This  observation 
is  supported  by  the  good  results  in  our  noise  sensitivity  analysis.  Simplicity  adds  yet  another 
dimension  to  the  strength  of  this  algorithm,  making  real-time  implementation  feasible.  Although 
we  are  focusing  on  presenting  accurate  optical  flow  results,  our  algorithm  also  computes  all  the 
motion  parameters,  including  3-D  translation  and  rotation,  along  with  the  flow.  These  motion 
parameters  can  be  directly  utilized  for  other  motion  applications,  for  example,  computing  time-to- 
contact  with  y,  or  performing  derotation  (stabilization)  with  p (16).  These  applications,  however, 
are  limited  by  inherent  ambiguities  due  to  image  noisefl]  ,[44] ; on  the  other  hand,  a scheme  of 
integrating  or  propagating  motion  information  globally  is  restricted  by  occlusion  and  motion 
boundary  effects.  Our  future  studies  will  investigate  this  problem;  hopefully,  the  results  can  be 
integrated  into  the  current  work.  In  summary,  this  work  has  generalized  and  unified  several  previ- 
ous successful  theoretical  approaches  and  has  resulted  in  a versatile  and  flexible  algorithm. 
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Appendix.  A 

We  prove  Theorem  1 as  follows: 
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Proof:  The  first  equality  comes  from  the  orthogonality  of  { (jc)  } . We  now  prove  the  second 

equality,  which  claims  that  the  scalar  product  of  a function  and  the  ^h  Hermite  polynomial  is  equal 
to  the  scalar  product  of  the  Mi  derivative  of  the  function  and  1 . 


{I,  Hi)  = J G{x)I(x)H^{x)dx 

— oo 

oo 

= ^G(x)I(x)(-\)i^G-Hx)'^^^^dx 

~~oo 

= (-\)>^  \ I (x)'^!^^^dx 

— oo 

= tdI{x)d’^-'^G(x) 


dx 


,^,d'‘-^G(x) 

dx^~  ^ 


dx 


□ 


Appendix.  B 


Let  and  b,  defined  in  (34),  contain  no  noise  and  let  the  noise  be  modelled  as  in  (40).  Then 


-1 


E = As  + b = 0 and  5 = -{A^A  ) A^b. 

n ^ n n'  n 


(44) 


Let  the  new  optical  flow  be  s and  the  new  residual  be  E , and  assume  that  N « A^  and  Ab  « b ele- 


mentwise. Then 


S = -[  (A„  + Ar)^(A^  + A0]-’  (A„  + A0^(fc  + Afc)  ,and 
[ (A„  + AO  ^ (A„  + AO  ] -'  = (A JA„  [/  + (A JA„) (A JiV  + Af^A„)  ] ) -> 

= [/-  (AjA„)->  (AlN  + NTa^)]  (AJAJ-I 


(45) 


, SO 


(46) 
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- (AX)-'^>+  {AIN  + NTA^)  (AX)-UJfc+  {AlA^^Alb-  (AlA^)-^Aj;Ab 


Using  (44),  this  can  be  simplified  as  follows: 

5-5  - and  (AJAJ-1AJA^5-(AX)-1AJAZ7. 

For  the  residual,  substituting  s into  (40),  and  using  (44),  we  have 


iA+N)s-A„  (AjAJ  -^AlNs-A„  (AjAJ  -'AjAi  + b + Ab 

^ n ^ n ^ n n n ^ n n'  n 


(/-A„(AjAJ-iAj)  {Ns  + ^b) 


as  in  (42). 


To  understand  E better,  we  analyze  the  matrix  /- A^  (A^A^)  “^Aj,  denoted  by  T. 


It  is  easy  to  verify  that  the  only  nontrivial  eigenvalues  of  matrix  T is/are  1,  which  means  that  it 
maps  any  vector  (A^5  + AZ?)  to  only  the  directions  specified  by  the  eigenvectors  corresponding  to 
the  nontrivial  eigenvalues. 
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